17 research outputs found

    Associative conceptual space-based information retrieval systems

    In this 'Information Era', with the availability of large collections of books, articles, journals, CD-ROMs, video films and so on, there is an increasing need for intelligent information retrieval systems that enable users to find the desired information easily. Many attempts have been made to construct such retrieval systems, including the electronic ones used in libraries and the search engines for the World Wide Web. In many cases, however, the so-called 'precision' and 'recall' of these systems leave much to be desired. In this paper, a new AI-based retrieval system is proposed, inspired by, among other things, the WEBSOM algorithm. However, contrary to that approach, where domain knowledge is extracted from the full text of all books, we propose a system where certain specific meta-information is automatically assembled using only the index of every document. This knowledge extraction process results in a new type of concept space, the so-called Associative Conceptual Space, where the 'concepts' found in all documents are clustered using a Hebbian-type learning algorithm. Then, each document can be characterised by comparing the concepts occurring in it to those present in the associative conceptual space. Applying these characterisations, all documents can be clustered such that semantically similar documents lie close together on a Self-Organising Map. This map can easily be inspected by its user.
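
As a rough illustration of what a Hebbian-type association between index concepts could look like, the sketch below strengthens the link between two concepts each time they appear together in one document's index. The update rule, the learning rate, and the example index terms are all assumptions made for illustration; the abstract does not specify the paper's actual algorithm.

```python
from collections import defaultdict
from itertools import combinations


def hebbian_associations(document_indexes, learning_rate=0.1):
    """Return association weights for concept pairs that co-occur in an index.

    Hypothetical update rule: each co-occurrence moves the pair's weight a
    fraction `learning_rate` of the way towards 1.0, so weights stay in [0, 1).
    """
    weights = defaultdict(float)
    for concepts in document_indexes:
        # Sort so that each unordered pair always maps to the same key.
        for pair in combinations(sorted(set(concepts)), 2):
            current = weights[pair]
            weights[pair] = current + learning_rate * (1.0 - current)
    return dict(weights)


# Made-up index terms for three documents.
indexes = [
    ["neural network", "learning", "retrieval"],
    ["retrieval", "index", "learning"],
    ["index", "library"],
]
weights = hebbian_associations(indexes)
```

Pairs that co-occur in more indexes end up with larger weights, which is the property a later clustering step would rely on.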

    ContextD: An algorithm to identify contextual properties of medical terms in a Dutch clinical corpus

    Background: To extract meaningful information from electronic medical records, such as signs and symptoms, diagnoses, and treatments, it is important to take into account the contextual properties of the identified information: negation, temporality, and experiencer. Most work on automatic identification of these contextual properties has been done on English clinical text. This study presents ContextD, an adaptation of the English ConText algorithm to the Dutch language, and a Dutch clinical corpus. Results: The ContextD algorithm utilized 41 unique triggers to identify the contextual properties in the clinical corpus. For the negation property, the algorithm obtained an F-score from 87% to 93% for the different document types. For the experiencer property, the F-score was 99% to 100%. For the historical and hypothetical values of the temporality property, F-scores ranged from 26% to 54% and from 13% to 44%, respectively. Conclusions: ContextD showed good performance in identifying negation and experiencer property values across all Dutch clinical document types. Accurate identification of the temporality property proved to be difficult and requires further work. The anonymized and annotated Dutch clinical corpus can serve as a useful resource for further algorithm development.
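
A minimal sketch of trigger-based negation detection, in the general spirit of trigger-and-scope approaches, may make the idea concrete. The trigger and termination word lists below are illustrative assumptions (common Dutch words for "no", "not" and "but"), not the 41 triggers used by ContextD, and the forward-only scope rule is a simplification.

```python
import re

# Hypothetical, illustrative word lists; not the triggers used in the study.
NEGATION_TRIGGERS = {"geen", "niet"}  # Dutch for "no", "not"
TERMINATION_TERMS = {"maar"}          # Dutch for "but": closes the negation scope


def negated_terms(sentence, medical_terms):
    """Return the medical terms that fall inside a forward negation scope."""
    tokens = re.findall(r"\w+", sentence.lower())
    negated = set()
    in_scope = False
    for token in tokens:
        if token in NEGATION_TRIGGERS:
            in_scope = True           # scope opens at a trigger
        elif token in TERMINATION_TERMS:
            in_scope = False          # scope closes at a termination term
        elif in_scope and token in medical_terms:
            negated.add(token)
    return negated


# "Patient has no fever but does cough."
result = negated_terms("Patiënt heeft geen koorts maar wel hoesten",
                       {"koorts", "hoesten"})
```

Here only "koorts" (fever) is marked as negated, because "maar" ends the scope before "hoesten" (coughing) is reached.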

    Design and implementation of a standardized framework to generate and evaluate patient-level prediction models using observational healthcare data

    Objective: To develop a conceptual prediction model framework containing standardized steps and describe the corresponding open-source software developed to consistently implement the framework across computational environments and observational healthcare databases to enable model sharing and reproducibility. Methods: Based on existing best practices we propose a 5-step standardized framework for: (1) transparently defining the problem; (2) selecting suitable datasets; (3) constructing variables from the observational data; (4) learning the predictive model; and (5) validating the model performance. We implemented this framework as open-source software utilizing the Observational Medical Outcomes Partnership Common Data Model to enable convenient sharing of models and reproduction of model evaluation across multiple observational datasets. The software implementation contains default covariates and classifiers but the framework enables customization and extension. Results: As a proof-of-concept, demonstrating the transparency and ease of model dissemination using the software, we developed prediction models for 21 different outcomes within a target population of people suffering from depression across 4 observational databases. All 84 models are available in an accessible online repository to be implemented by anyone with access to an observational database in the Common Data Model format. Conclusions: The proof-of-concept study illustrates the framework's ability to develop reproducible models that can be readily shared and offers the potential to perform extensive external validation of models, and improve their likelihood of clinical uptake. In future work the framework will be applied to perform an "all-by-all" prediction analysis to assess the observational data prediction domain across numerous target populations, outcomes and time, and risk settings.

    Improving sensitivity of machine learning methods for automated case identification from free-text electronic medical records

    Background: Distinguishing cases from non-cases in free-text electronic medical records is an important initial step in observational epidemiological studies, but manual record validation is time-consuming and cumbersome. We compared different approaches to develop an automatic case identification system with high sensitivity to assist manual annotators. Methods: We used four different machine-learning algorithms to build case identification systems for two data sets, one comprising hepatobiliary disease patients, the other acute renal failure patients. To improve the sensitivity of the systems, we varied the imbalance ratio between positive cases and negative cases using under- and over-sampling techniques, and applied cost-sensitive learning with various misclassification costs. Results: For the hepatobiliary data set, we obtained a high sensitivity of 0.95 (on a par with manual annotators, as compared to 0.91 for a baseline classifier) with specificity 0.56. For the acute renal failure data set, sensitivity increased from 0.69 to 0.89, with specificity 0.59. Performance differences between the various machine-learning algorithms were not large. Classifiers performed best when trained on data sets with an imbalance ratio below 10. Conclusions: We were able to achieve high sensitivity with moderate specificity for automatic case identification on two data sets of electronic medical records. Such a highly sensitive case identification system can be used as a pre-filter to significantly reduce the burden of manual record validation.
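
The two ingredients mentioned in the abstract, the sensitivity/specificity metrics and under-sampling to a chosen imbalance ratio, can be sketched as follows. The function names, the random under-sampling strategy, and the toy labels are assumptions for illustration; the abstract does not describe the study's actual sampling procedure in this detail.

```python
import random


def sensitivity_specificity(y_true, y_pred):
    """Compute (sensitivity, specificity) for binary labels (1 = case, 0 = non-case).

    Assumes both classes occur at least once in y_true.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def undersample_negatives(labels, max_ratio, seed=0):
    """Return sorted indices keeping all positives and at most
    `max_ratio` randomly chosen negatives per positive."""
    rng = random.Random(seed)
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    keep = min(len(negatives), int(max_ratio * len(positives)))
    return sorted(positives + rng.sample(negatives, keep))


# Toy data: 2 cases and 50 non-cases, reduced to an imbalance ratio of 10.
labels = [1] * 2 + [0] * 50
kept = undersample_negatives(labels, max_ratio=10)
sens, spec = sensitivity_specificity([1, 1, 1, 0, 0, 0, 0],
                                     [1, 1, 0, 1, 0, 0, 0])
```

Under-sampling discards information from the majority class, so it trades some specificity for sensitivity; that is the trade-off the reported numbers reflect.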

    Risk of atrial fibrillation among bisphosphonate users: a multicenter, population-based, Italian study

    Summary: Bisphosphonate treatment is used to prevent bone fractures. A controversial association of bisphosphonate use and risk of atrial fibrillation has been reported. In our study, current alendronate users were associated with a higher risk of atrial fibrillation as compared with those who had stopped bisphosphonate (BP) therapy for more than 1 year. Introduction: Bisphosphonates are widely used to prevent bone fractures. Controversial findings regarding the association between bisphosphonate use and the risk of atrial fibrillation (AF) have been reported. The aim of this study was to evaluate the risk of AF in association with BP exposure. Methods: We performed a nested case-control study using the databases of drug-dispensing and hospital discharge diagnoses from five Italian regions. The data cover a period ranging from July 1, 2003 to December 31, 2006. The study population comprised new users of bisphosphonates aged 55 years and older. Patients were followed from the first BP prescription until an occurrence of an AF diagnosis (index date, i.e., ID), cancer, death, or the end of the study period, whichever came first. For the risk estimation, any AF case was matched by age and sex to up to 10 controls from the same source population. A conditional logistic regression was performed to obtain the odds ratio with 95 % confidence intervals (CI). The BP exposure was classified into current (<90 days prior to ID), recent (91–180), past (181–364), and distant past (≥365) use, with the latter category being used as a reference point. A subgroup analysis by individual BP was then carried out. Results: In comparison with distant past users of BP, current users of BP showed an almost twofold increased risk of AF: odds ratio (OR) = 1.78 and 95 % CI = 1.46–2.16. Specifically, alendronate users were mostly associated with AF as compared with distant past use of BP (OR, 1.97; 95 % CI, 1.59–2.43). Conclusion: In our nested case-control study, current users of BP are associated with a higher risk of atrial fibrillation as compared with those who had stopped BP treatment for more than 1 year.
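
The exposure categories described in the Methods can be written down as a small classification function. This is only a sketch: the abstract gives "<90" and "91–180", which leaves day 90 unassigned, so counting day 90 as current is an assumption here, as are the function name and the way the day count is obtained.

```python
def classify_bp_exposure(days_before_index_date):
    """Map days before the index date to an exposure category.

    Boundaries follow the abstract (current <90, recent 91-180, past 181-364,
    distant past >=365). Assumption: day 90, which the abstract leaves
    unassigned, is counted as "current".
    """
    if days_before_index_date < 0:
        raise ValueError("days_before_index_date must be non-negative")
    if days_before_index_date <= 90:
        return "current"
    if days_before_index_date <= 180:
        return "recent"
    if days_before_index_date <= 364:
        return "past"
    return "distant past"  # reference category in the study


categories = [classify_bp_exposure(d) for d in (30, 91, 180, 181, 364, 365)]
```

In the study, the "distant past" category serves as the reference against which the odds ratios for the other categories are estimated.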

    Population-based analysis of non-steroidal anti-inflammatory drug use among children in four European countries in the SOS project: What size of data platforms and which study designs do we need to assess safety issues?

    Background: Data on utilization patterns and safety of non-steroidal anti-inflammatory drugs (NSAIDs) in children are scarce. The purpose of this study was to investigate the utilization of NSAIDs among children in four European countries as part of the Safety Of non-Steroidal anti-inflammatory drugs (SOS) project. Methods: We used longitudinal patient data from seven databases (GePaRD, IPCI, OSSIFF, Pedianet, PHARMO, SISR, and THIN) to calculate prevalence rates of NSAID use among children (0-18 years of age) from Germany, Italy, the Netherlands, and the United Kingdom. All databases contained a representative population sample and recorded demographics, diagnoses, and drug prescriptions. Prevalence rates of NSAID use were stratified by age, sex, and calendar time. The person-time of NSAID exposure was calculated by using the duration of the prescription supply. We calculated incidence rates for serious adverse events of interest. For these adverse events of interest, sample size calculations were conducted (alpha = 0.05; 1-beta = 0.8) to determine the amount of NSAID exposure time that would be required for safety studies in children. Results: The source population comprised 7.7 million children with a total of 29.6 million person-years of observation. Of those, 1.3 million children were exposed to at least one of 45 NSAIDs during observation time. Overall prevalence rates of NSAID use in children differed across countries, ranging from 4.4 (Italy) to 197 (Germany) per 1000 person-years in 2007. For Germany, the United Kingdom, and Italian pediatricians, we observed high rates of NSAID use among children aged one to four years. For all four countries, NSAID use increased with older age categories for children older than 11. In this analysis, only for ibuprofen (the most frequently used NSAID) was enough exposure available to detect a weak association (relative risk of 2) between exposure and asthma exacerbation (the most common serious adverse event of interest). Conclusions: Patterns of NSAID use in children were heterogeneous across four European countries. The SOS project platform captures data on more than 1.3 million children who were exposed to NSAIDs. Even larger data platforms and the use of advanced versions of case-only study designs may be needed to conclusively assess the safety of these drugs in children.
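
For readers unfamiliar with rates expressed per 1000 person-years, the arithmetic can be sketched as below. The helper name and the example counts passed to it are hypothetical; only the totals of 7.7 million children, 29.6 million person-years, and 1.3 million exposed children come from the abstract.

```python
def rate_per_1000_person_years(count, person_years):
    """Scale a count observed over `person_years` to a rate per 1000 person-years."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return 1000.0 * count / person_years


# Hypothetical example: 44 users observed over 10,000 person-years.
example_rate = rate_per_1000_person_years(44, 10_000)

# Figures taken from the abstract.
children = 7.7e6
observed_person_years = 29.6e6
exposed_children = 1.3e6

share_exposed = exposed_children / children         # fraction ever exposed to an NSAID
mean_follow_up = observed_person_years / children   # average observed years per child
```

With the abstract's totals, roughly 17% of children were exposed at least once, over an average follow-up of a little under four years per child.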

    The implicitome: A resource for rationalizing gene-disease associations

    High-throughput experimental methods such as medical sequencing and genome-wide association studies (GWAS) identify increasingly large numbers of potential relations between genetic variants and diseases. Both biological complexity (millions of potential gene-disease associations) and the accelerating rate of data production necessitate computational approaches to prioritize and rationalize potential gene-disease relations. Here, we use concept profile technology to expose from the biomedical literature both explicitly stated gene-disease relations (the explicitome) and a much larger set of implied gene-disease associations (the implicitome). Implicit relations are largely unknown to, or are even unintended by, the original authors, but they vastly extend the reach of existing